Generation of a high band extension of a bandwidth extended audio signal

ABSTRACT

An audio decoder configured to generate a high band extension of an audio signal from an envelope and an excitation. The audio decoder includes a control arrangement configured to jointly control envelope shape and excitation noisiness with a common control parameter (f).

TECHNICAL FIELD

The proposed technology relates to generation of a high band extension of a bandwidth extended audio signal.

BACKGROUND

Most existing telecommunication systems operate on a limited audio bandwidth. Stemming from the limitations of the land-line telephony systems, most voice services are limited to only transmitting the lower end of the spectrum. Although the audio bandwidth is enough for most conversations, there is a desire to increase bandwidth to improve intelligibility and sense of presence. Although the capacity in telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks smaller transmission bandwidths for each call yield lower power consumption in both the mobile device and the base station. This translates to energy and cost savings for the mobile operator, while the end user will experience prolonged battery life and increased talk-time. Further, with less consumed bandwidth per user the mobile network can service a larger number of users in parallel.

A property of the human auditory system is that the perception is frequency dependent. In particular, our hearing is less accurate for higher frequencies. This has inspired so called bandwidth extension (BWE) techniques, where a high frequency band is reconstructed from a low frequency band using limited resources.

The conventional BWE uses a representation of the spectral envelope of the extended high band signal, and reproduces the spectral fine structure of the signal by using a modified version of the low band signal. If the high band envelope is represented by a filter, the fine structure signal is often called the excitation signal. An accurate representation of the high band envelope is perceptually more important than the fine structure. Consequently, it is common that the available resources in terms of bits are spent on the envelope representation while the fine structure is reconstructed from the coded low band signal without additional side information. The basic concept of BWE is illustrated in FIG. 1.

The technology of BWE has been applied in a variety of audio coding systems. For example, the 3GPP AMR-WB+, [1], uses a time domain BWE based on a low band coder which switches between Code Excited Linear Predictor (CELP) speech coding and Transform Coded Residual (TCX) coding. Another example is the 3GPP eAAC transform based audio codec which performs a transform domain variant of BWE called Spectral Band Replication (SBR), [2]. Here, the excitation is created using a mixture of tonal components generated from the low-band excitation and a noise source in order to match the tonal to noise ratio of the input signal. In general, the noisiness of the signal can be described as a measure of how flat the spectrum is, e.g. using a spectral flatness measure. The noisiness can also be described as non-tonality, randomness or non-structure of the excitation. Increasing the noisiness of a signal is to make it more noise-like by e.g. mixing the signal with a noise signal from e.g. a random number generator or any other noise source. It can also be done by modifying the spectrum of the signal to make it more flat.

The spectral fine structure from the low band may be very different from the fine structure found in the high band. In particular, the combination of an excitation generated from the low band signal together with the high band envelope may produce undesired artifacts as residing harmonicity or shape of the excitation may be emphasized by the envelope shaping in an uncontrolled way. As a safety measure, it is common to flatten the high band envelope in order to limit undesired interaction between the excitation and the envelope. Although this solution may give a reasonable trade-off, the flatter envelope may be perceived as more noisy and the high band envelope will be less accurate.

SUMMARY

An object of the proposed technology is an improved control of the generation of the high band extension of a bandwidth extended audio signal.

This object is achieved in accordance with the attached claims.

A first aspect of the proposed technology involves a method of generating a high band extension of an audio signal from an envelope and an excitation. The method includes the step of jointly controlling envelope shape and excitation noisiness with a common control parameter.

A second aspect of the proposed technology involves an audio decoder configured to generate a high band extension of an audio signal from an envelope and an excitation. The audio decoder includes a control arrangement configured to jointly control envelope shape and excitation noisiness with a common control parameter.

A third aspect of the proposed technology involves a user equipment (UE) including an audio decoder in accordance with the second aspect.

A fourth aspect of the proposed technology involves an audio encoder including a spectral flatness estimator configured to determine, for transmission to a decoder, a measure of spectral flatness of a high band signal.

The proposed technology allows a more pronounced envelope structure which masks perceptual artifacts created by artificially generated high band excitations. At the same time joint control of envelope structure and noisiness of the excitation improves naturalness of the reconstructed audio signal.

BRIEF DESCRIPTION OF THE DRAWINGS

The proposed technology, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings.

FIG. 1 illustrates the basic concept of the BWE technique in the form of a frequency spectrum. The coded low band signal is extended with a high band using a high band envelope and an excitation signal which is generated from the low band signal.

FIG. 2 illustrates an example BWE system with a CELP codec for the low band and where the upper band is reconstructed using a Linear Predictor (LP) envelope and an excitation signal which is generated from modified output parameters of the CELP decoder.

FIG. 3 illustrates an example BWE decoder which has a corresponding encoder as shown in FIG. 2. The modulated excitation is mixed with a noise signal from a noise generator.

FIG. 4 illustrates an example embodiment of the proposed technology in a CELP decoder system with a joint control arrangement for the excitation mixing and spectral shape.

FIG. 5 illustrates an example of an input LP spectrum and an LP spectrum which has been emphasized with a post-filter.

FIG. 6 illustrates an example embodiment of an encoder using a spectral flatness analysis based on Linear Predictive Coding (LPC) coefficients.

FIG. 7 illustrates an example embodiment of a decoder corresponding to the encoder in FIG. 6 which uses the transmitted flatness parameter for joint spectral envelope and excitation structure control.

FIG. 8 illustrates an example of a transform based audio codec which has a joint envelope encoding for the entire spectrum and employs BWE techniques to obtain the spectral fine structure of the high band.

FIG. 9 illustrates an example of a BWE decoder belonging to a corresponding encoder as shown in FIG. 8. The modulated excitation is modified using a compressor to get a flatter fine structure in the high band excitation.

FIG. 10 illustrates an example embodiment of the proposed technology in a transform based decoder system with a joint controller for excitation compression and envelope expansion.

FIG. 11 illustrates an example embodiment of an encoder which has a local decoding unit and a low band error estimator.

FIG. 12 illustrates an example embodiment of the proposed technology in a transform based decoder system with a joint control arrangement for excitation compression and envelope expansion, where the joint control is adapted using the low band error estimate from the encoder.

FIG. 13 illustrates an example embodiment of a control arrangement.

FIG. 14 illustrates a User Equipment (UE) including a decoder provided with a control arrangement.

FIG. 15 is a flow chart illustrating the proposed technology.

FIG. 16 is a flow chart illustrating an example embodiment of the proposed technology.

FIG. 17 is a flow chart illustrating an example embodiment of the proposed technology.

FIG. 18 is a flow chart illustrating an example embodiment of the proposed technology.

FIG. 19 is a flow chart illustrating an example embodiment of the proposed technology.

DETAILED DESCRIPTION

In the following detailed description blocks performing the same or similar functions have been provided with the same reference designations.

The proposed technology may be used both in time domain BWE and frequency domain BWE. Example embodiments for both will be given below.

Time Domain BWE

An example embodiment of a prior art BWE mainly intended for speech applications is shown in FIG. 2. This example uses a CELP speech encoding algorithm for the low band of the input signal. The high band envelope is represented with an LP filter. The synthesis of the high band is created by using a modified version of the low band excitation signal extracted from the CELP synthesis.

Each input signal frame y is split into a low frequency band signal y_(L) and a high frequency band signal y_(H) using an analysis filter bank 10. Any suitable filter bank may be used, but it would essentially consist of a low-pass and a high-pass filter, e.g. a Quadrature Mirror Filter (QMF) filter bank. The low band signal is fed to a CELP encoding algorithm performed in a CELP encoder 12. LP analysis is conducted on the high band signal in an LP analysis block 14 to obtain a representation A of the high band envelope. The LP coefficients defining A are encoded with an LP quantizer or LP encoder 16, and the quantization indices I_(LP) are multiplexed in a bitstream mux (multiplexer) 18 together with the CELP encoder indices I_(CELP) to be stored or transmitted to a decoder. The decoder in turn demultiplexes the indices I_(LP) and I_(CELP) in a bitstream demux (de-multiplexer) 20, and forwards them to the LP decoder 22 and the CELP decoder 24, respectively. In the CELP decoding the CELP excitation signal x_(L) is extracted and processed such that the frequency spectrum is modulated to generate the high band excitation signal x_(H).

There exists a variety of modulation schemes to create a high band excitation x_(H) from a low band excitation signal x_(L) in an excitation processor 26. For example, reversing the spectrum guarantees that the properties of the signal are similar in the crossover region between low band and high band, but the high end of the high band signal may have undesired properties. Other ways of generating a high band excitation are to perform other types of modulation which may or may not preserve the harmonic structure of a series of harmonics. The excitation signal may be taken from only a part of the low band or even adaptively by searching the low band for suitable parts to be used to form the high band excitation signal. The latter approach may also require that parameters are encoded such that the decoder may identify the regions used in the high band excitation.

The modulated excitation x_(H) is filtered using the high band LP filter 1/Â to form the high band synthesis ŷ_(H). This is done in an LP synthesis block 28. The output ŷ_(L) of the CELP decoder is joined with the high band synthesis ŷ_(H) in synthesis filter bank 30 to form the output signal ŷ.

In FIG. 2 and the following figures the lines to and from the bitstream mux 18 and bitstream demux 20, respectively, have been dashed to indicate that they transfer indices representing quantized quantities rather than the actual values of the quantized quantities.

The excitation from the low band may have properties that are not suitable to be used as high band excitation. For instance, the low band signal often contains strong harmonic structure which gives annoying artifacts when transferred to the high band. One prior art solution to control the excitation structure is to mix the low band excitation signal with noise. An example decoder of such a system is shown in FIG. 3. Here, the high band LP filter coefficients Â are decoded and the CELP decoder 24 is run while extracting the excitation signal just as described in FIG. 2. However, the modulated excitation x_(H) is also mixed, as illustrated by multipliers 32, 34 and an adder 36, with a Gaussian noise signal n from a noise generator 38 using respective mixing factors g_(x)(i) and g_(n)(i) for each subframe i, i.e.:

$\begin{matrix}{\tilde{x}_{i} = g_{x}(i)\,x_{H,i} + g_{n}(i)\,n_{i}} & (1)\end{matrix}$

Here x_(H,i) represents the samples x_(H) of subframe i, such that x_(H)=[x_(H,1) x_(H,2) . . . x_(H,Nsub)], where Nsub is the number of subframes. In this example Nsub=4. It may further be beneficial to adapt the temporal shape of the noise signal n such that it matches the temporal shape of x_(H).

In this example the mixing factors are determined in a mix controller 40 and are based on a voicing parameter ν(i) of each subframe i of the CELP codec:

$\begin{matrix}\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{v(i)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {v(i)}} \right)}/E_{2}}}\end{matrix} \right. & (2)\end{matrix}$

where E₁ and E₂ are the frame energies of x_(H) and n, respectively, i.e.:

$\begin{matrix}\left\{ \begin{matrix}{E_{1} = {\sum\limits_{k = 0}^{L - 1}{x_{H}^{2}(k)}}} \\{E_{2} = {\sum\limits_{k = 0}^{L - 1}{n^{2}(k)}}}\end{matrix} \right. & (3)\end{matrix}$

where the current frame is represented with samples k=0, 1, 2, . . . , L−1. The voicing parameter ν(i) influences the balance of the noise component n and the modulated excitation x_(H) and may e.g. be in the interval ν(i)∈[0,1]. The voicing parameter expresses the signal periodicity (or tonality or harmonicity) and is computed from the energy E_(ACB) of the adaptive codebook and the energy E_(FCB) of the fixed codebook of the CELP codec, for example in accordance with:

$\begin{matrix}{v(i) = 0.5\left( 1 - r_{v}(i) \right)} & (4)\end{matrix}$

where

$\begin{matrix}{r_{v}(i) = \frac{E_{v}(i) - E_{C}(i)}{E_{v}(i) + E_{C}(i)}} & (5)\end{matrix}$

where E_(ν)(i) and E_(C)(i) are the energies of the scaled pitch code vector and scaled algebraic code vector for subframe i.
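Equations (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration and not the codec implementation: the function names are hypothetical, the frame length is assumed to be divisible by the number of subframes, and the energies are assumed to be non-zero.

```python
import math

def voicing(e_v, e_c):
    """Voicing parameter of equations (4)-(5) from the energies of the
    scaled pitch code vector (e_v) and scaled algebraic code vector (e_c).
    Assumes e_v + e_c > 0."""
    r_v = (e_v - e_c) / (e_v + e_c)
    return 0.5 * (1.0 - r_v)

def mix_excitation(x_h, noise, v, n_sub=4):
    """Mix the high band excitation with noise per subframe, equations (1)-(3).
    x_h and noise are one frame each; v holds one voicing value per subframe."""
    assert len(x_h) == len(noise) and len(v) == n_sub
    e1 = sum(s * s for s in x_h)    # frame energy of x_H, equation (3)
    e2 = sum(s * s for s in noise)  # frame energy of n, equation (3)
    sub_len = len(x_h) // n_sub     # assumed: frame length divisible by n_sub
    mixed = []
    for i in range(n_sub):
        g_x = math.sqrt(v[i])                    # equation (2)
        g_n = math.sqrt(e1 * (1.0 - v[i]) / e2)  # equation (2)
        for k in range(i * sub_len, (i + 1) * sub_len):
            mixed.append(g_x * x_h[k] + g_n * noise[k])  # equation (1)
    return mixed
```

With ν(i)=1 in every subframe the noise gain is zero and the excitation passes through unchanged, while ν(i)=0 replaces the excitation with noise scaled to the frame energy E₁.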

The mixed excitation x̃_(H) is filtered in LP synthesis block 28 using the high band LP filter 1/Â to form the high band synthesis ŷ_(H). The output ŷ_(L) of the CELP decoder is joined with the high band synthesis ŷ_(H) in synthesis filter bank 30 to form the output signal ŷ.

An example embodiment of a time domain BWE based on the technology proposed herein focuses on an audio encoder and decoder system mainly intended for speech applications. This embodiment resides in the decoder of an encoding and decoding system as outlined in FIG. 2 and with an excitation noise mixing system as described in FIG. 3. The addition to the prior art systems is an additional control on both the spectral envelope and the excitation mixing by jointly controlling envelope shape and excitation noisiness with a common (or shared) control parameter f, as exemplified in the decoder 200 in FIG. 4. The control parameter f is “common” in the sense that the same control parameter f is used to control both envelope shape and excitation noisiness. In this example a single control parameter f∈[0,1] is used. It should, however, be noted that any interval of the control parameter may be used, e.g. [−A,A], [0,A], [A,0] or [A,B] for any suitable A and B. However, there is a benefit of having a simple unit interval for the purpose of controlling two or more processes jointly.

The control of the spectral envelope may, for example, be done using a formant post-filter H(z) (illustrated at 42 in FIG. 4) of the form:

$\begin{matrix}{{H(z)} = \frac{\hat{A}\left( {z/\gamma_{1}} \right)}{\hat{A}\left( {z/\gamma_{2}} \right)}} & (6)\end{matrix}$

where

-   Â is a linear predictor filter representing the envelope, and
-   γ₁, γ₂ are functions of the control parameter f.

This post-filter 42 is typically used for cleaning spectral valleys in a CELP decoder, and is controlled by a joint post-filter and excitation controller 44. An example of the spectrum envelope emphasis obtained with such a post-filter can be seen in FIG. 5. In this example embodiment the filter 42 is made adaptive by modifying γ₁, γ₂ using the control parameter f in accordance with:

$\begin{matrix}\left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {{f \cdot \Delta}\; \gamma}}} \\{\gamma_{2} = {\gamma_{0} - {{f \cdot \Delta}\; \gamma}}}\end{matrix} \right. & (7)\end{matrix}$

where γ₀, Δγ are predetermined constants. Suitable values for γ₀ may be γ₀=0.75 or in the range γ₀∈[0.5,0.9], and suitable values for Δγ may be Δγ=0.15 or in the range Δγ∈[0.1,0.3]. Note however that γ₀ and Δγ must be chosen such that γ₁∈[0,1] and γ₂∈[0,1]. With this setup, the control value f=1 will give the strongest modification from the post-filter while f=0 will disable the post-filter by setting γ₁=γ₂ which yields H(z)=1.
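The adaptive post-filter of equations (6) and (7) may be sketched as follows. The sketch assumes the usual bandwidth-expansion identity, in which Â(z/γ) has coefficients a_k·γ^k, and a direct-form filter with zero initial state; the function names are hypothetical and the default constants are the example values given above.

```python
def post_filter_gammas(f, gamma0=0.75, delta_gamma=0.15):
    """Equation (7): map the control parameter f to (gamma1, gamma2)."""
    return gamma0 + f * delta_gamma, gamma0 - f * delta_gamma

def weighted(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is a[k] * gamma**k."""
    return [a_k * gamma ** k for k, a_k in enumerate(a)]

def post_filter(x, a, f, gamma0=0.75, delta_gamma=0.15):
    """Filter x with H(z) = A(z/gamma1) / A(z/gamma2), equation (6).
    a = [1, a_1, ..., a_P] are the LP coefficients; zero initial state assumed."""
    g1, g2 = post_filter_gammas(f, gamma0, delta_gamma)
    num, den = weighted(a, g1), weighted(a, g2)
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)  # den[0] equals 1 because a[0] equals 1
    return y
```

For f=0 the numerator and denominator coincide, so the output equals the input, which matches H(z)=1.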

In another variant of the post-filter 42 the idle state of the filter for f=0 is modified to have a flattening effect on the spectrum. This may be useful for situations where the initial spectrum has too much structure, such that a disabling of the post-filter is not enough to achieve the desired amount of spectral valley de-emphasis. In that case the expression in equation (7) can be modified as:

$\begin{matrix}\left\{ \begin{matrix}{\gamma_{1} = \gamma_{0} - \gamma_{\exp} + f \cdot \Delta\gamma} \\{\gamma_{2} = \gamma_{0} + \gamma_{\exp} - f \cdot \Delta\gamma}\end{matrix} \right. & (8)\end{matrix}$

or

$\begin{matrix}\left\{ \begin{matrix}{\gamma_{1} = \gamma_{0} - \gamma_{\exp} + f \cdot \left( \Delta\gamma + \gamma_{\exp} \right)} \\{\gamma_{2} = \gamma_{0} + \gamma_{\exp} - f \cdot \left( \Delta\gamma + \gamma_{\exp} \right)}\end{matrix} \right. & (9)\end{matrix}$

where the equation (9) implicitly accounts for the flattening filter offset. Note that f=0 in this case generates γ₁<γ₂ which means the post-filter 42 has a flattening effect rather than an emphasizing effect on the shape of the envelope.
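As a quick check of the two variants, evaluating equations (8) and (9) at the end points of the unit interval gives the following differences. This is a worked example derived directly from the two equations, assuming only that γ_exp is positive:

$\begin{matrix}{f = 0:\quad \gamma_{1} - \gamma_{2} = - 2\gamma_{\exp}\quad\text{for both (8) and (9)}} \\{f = 1:\quad \gamma_{1} - \gamma_{2} = 2\left( \Delta\gamma - \gamma_{\exp} \right)\quad\text{for (8)}} \\{f = 1:\quad \gamma_{1} - \gamma_{2} = 2\Delta\gamma\quad\text{for (9)}}\end{matrix}$

Equation (9) thus reaches the same setting at f=1 as equation (7), γ₁=γ₀+Δγ and γ₂=γ₀−Δγ, whereas in equation (8) the offset γ_exp remains in effect over the whole range of f.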

The flattening effect may also be achieved by extending the range of the control parameter f to e.g. f∈[−1,1] or f∈[−A, A] or f∈[A, B] for suitable values of A and B. In this case, the post-filter 42 may be expressed as in equation (7) such that a negative f gives a flattening effect to the spectral envelope while a positive f enhances the spectral envelope structure. It may also be desirable to use different post-filter strengths for the spectral structure emphasis and spectral flattening, respectively. One such method would be to use a different Δγ depending on the sign of the control parameter f:

$\begin{matrix}\left\{ \begin{matrix}{\gamma_{1} = \gamma_{0} + f \cdot \Delta\gamma_{sharp},\quad \gamma_{2} = \gamma_{0} - f \cdot \Delta\gamma_{sharp},} & {f \geq 0} \\{\gamma_{1} = \gamma_{0} + f \cdot \Delta\gamma_{flat},\quad \gamma_{2} = \gamma_{0} - f \cdot \Delta\gamma_{flat},} & {f < 0}\end{matrix} \right. & (10)\end{matrix}$

where Δγ_(flat) and Δγ_(sharp) are predetermined constants which control the strength of the spectral flattening and of the spectral enhancement, respectively. Suitable values may be Δγ_(flat)=0.12 or in the range Δγ_(flat)∈[0.01,0.20] and Δγ_(sharp)=0.08 or in the range Δγ_(sharp)∈[0.01,0.20].
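The sign-dependent strength of equation (10) may be sketched as below. The function name is hypothetical, and γ₀=0.75 is assumed to carry over from the example value given for equation (7).

```python
def signed_gammas(f, gamma0=0.75, delta_sharp=0.08, delta_flat=0.12):
    """Equation (10): use delta_sharp for f >= 0 and delta_flat for f < 0."""
    delta = delta_sharp if f >= 0 else delta_flat
    return gamma0 + f * delta, gamma0 - f * delta
```

With these example constants, f=1 gives (γ₁, γ₂) close to (0.83, 0.67) and f=−1 gives values close to (0.63, 0.87), so a negative f yields γ₁<γ₂.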

The excitation mixing is in turn controlled by a mix controller 41 configured to control the excitation noisiness by mixing the high band excitation x_(H,i) of subframe i with noise n_(i) in accordance with (1), where the mixing factors g_(x)(i) and g_(n)(i) are defined by:

$\begin{matrix}\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{v(i)}\left( {1 - {\alpha \; f}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{v(i)}\left( {1 - {\alpha \; f}} \right)}} \right)}/E_{2}}}\end{matrix} \right. & (11)\end{matrix}$

where

-   ν(i) is a voicing parameter partially controlling the excitation noisiness,
-   α is a predetermined tuning constant,
-   E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and
-   E₂ is the frame energy of the noise n_(i) for all subframes i.

The tuning constant α decides the maximum modification compared to equation (2). A suitable value for α may be α=0.3 or in the range α∈[0,1]. When the control parameter f is close to 1 the mixing factors will be balanced to give more noise, while f close to 0 will give the unmodified noise proportion in the mix.

If negative values of the control parameter f are permitted, an alternative expression for the noise mixing factors generated by mix controller 41 is:

$\begin{matrix}\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{v(i)}\left( {1 - {\max \left( {0,{\alpha \; f}} \right)}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{v(i)}\left( {1 - {\max \left( {0,{\alpha \; f}} \right)}} \right)}} \right)}/E_{2}}}\end{matrix} \right. & (12)\end{matrix}$

where

-   ν(i) is a voicing parameter partially controlling the excitation noisiness,
-   α is a predetermined tuning constant,
-   E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and
-   E₂ is the frame energy of the noise n_(i) for all subframes i.

Here the function max(a,b) returns the maximum value of a and b as defined in equation (14) below. In the expression above this ensures that a negative f does not influence the noise mixing values.
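The jointly controlled mixing factors of equations (11) and (12) may be sketched as follows. The function name and the clamp_negative flag are hypothetical; the flag merely selects between the two equations, and E₂ is assumed to be non-zero.

```python
import math

def mixing_factors(v, f, e1, e2, alpha=0.3, clamp_negative=False):
    """Mixing factors g_x and g_n for one subframe.
    clamp_negative=False follows equation (11); True follows equation (12),
    where max(0, alpha*f) removes the influence of a negative f."""
    af = max(0.0, alpha * f) if clamp_negative else alpha * f
    w = v * (1.0 - af)                    # modified voicing weight
    g_x = math.sqrt(w)
    g_n = math.sqrt(e1 * (1.0 - w) / e2)  # requires w <= 1
    return g_x, g_n
```

One reason the clamp is useful: without it, a voicing value near 1 combined with a negative f makes the weight exceed 1 (for example ν=1, α=0.3, f=−1 gives 1.3), and the square root for g_(n) is then undefined.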

In an embodiment the control parameter f may be adapted by using parameters already present in the decoder 200. One example is to use the spectral tilt of the high band signal, since the post-filter 42 may be harmful in combination with a strong spectral tilt. Thus, the joint post-filter and excitation controller 44 may be configured to adapt the control parameter f to a high band spectral tilt t_(m) of frame m. The high band spectral tilt may be approximated using the second coefficient a_(1,m) of the decoded LP filter Â_(m)={1, a_(1,m), a_(2,m), . . . , a_(P,m)} of the current frame m, where P is the filter order.

It is generally beneficial to smoothen the adaptation to avoid creating abrupt changes in the spectral envelope, for example in accordance with:

$\begin{matrix}{t_{m} = \beta \cdot a_{1,m} + (1 - \beta)\max\left( 0,t_{m - 1} \right)} & (13)\end{matrix}$

where t_(m) is the spectral tilt value of frame m, t_(m-1) is the spectral tilt value of the previous frame m−1 and β=0.1 or in the range β∈[0,0.5]. The max function may be defined as:

$\begin{matrix}{{\max \left( {a,b} \right)} = \left\{ \begin{matrix}{a,} & {a \geq b} \\{b,} & {a < b}\end{matrix} \right.} & (14)\end{matrix}$

Here the max function ensures the spectral tilt value used from the previous frame is not negative. Other examples for smoothing the spectral tilt are:

$\begin{matrix}{t_{m} = \beta \cdot \max\left( 0,a_{1,m} \right) + (1 - \beta)t_{m - 1}} & (15)\end{matrix}$

and

$\begin{matrix}{t_{m} = \beta \cdot a_{1,m} + (1 - \beta)t_{m - 1}} & (16)\end{matrix}$

It may also be desirable to consider both negative and positive spectral tilts. In this case the absolute value of the spectral tilt approximation may be used, i.e.:

$\begin{matrix}{t_{m} = \beta \cdot \left| a_{1,m} \right| + (1 - \beta)t_{m - 1}} & (17)\end{matrix}$

The smoothened spectral tilt value can be mapped to the control parameter f with a piece-wise linear function:

$\begin{matrix}{{f\left( t_{m} \right)} = \left\{ \begin{matrix}{0,} & {t_{m} \geq C_{\max}} \\{{1 - {\left( {t_{m} - C_{\min}} \right)/\left( {C_{\max} - C_{\min}} \right)}},} & {C_{\min} \leq t_{m} < C_{\max}} \\{1,} & {t_{m} < C_{\min}}\end{matrix} \right.} & (18)\end{matrix}$

where C_(min) and C_(max) are predetermined constants. In this example the constant values are set to C_(max)=0.8 and C_(min)=0.4, but other suitable values may be chosen from C_(max)∈[0.5,2.0] and C_(min)∈[0,C_(max)].
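The tilt smoothing of equation (13) and the piece-wise linear mapping of equation (18) may be sketched as follows. The function names are hypothetical and the defaults are the example constants given above.

```python
def smooth_tilt(a1, t_prev, beta=0.1):
    """Equation (13): recursive smoothing of the tilt approximation a_{1,m};
    the previous tilt value is floored at zero before it is reused."""
    return beta * a1 + (1.0 - beta) * max(0.0, t_prev)

def control_from_tilt(t, c_min=0.4, c_max=0.8):
    """Equation (18): map the smoothed tilt t_m to the control parameter f."""
    if t >= c_max:
        return 0.0
    if t < c_min:
        return 1.0
    return 1.0 - (t - c_min) / (c_max - c_min)
```

For example, a smoothed tilt halfway between the two constants, t_(m)=0.6, maps to f=0.5, and any tilt at or above C_(max) maps to f=0.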

Returning to FIG. 4, using the modified g_(x) and g_(n) a new excitation signal x̃_(H) is obtained. This signal is filtered using the high band LP filter 1/Â (at 28) to form a first stage high band synthesis y′_(H). This signal is fed to the adaptive post-filter H(z) (at 42) to obtain the high band synthesis ỹ_(H). The output ŷ_(L) of the CELP decoder 24 is combined with the high band synthesis ỹ_(H) in the synthesis filter bank 30 to form the output signal ŷ.

Other alternatives exist to the tilt-based adaptation described above. For example, a measure of the spectral flatness of the high band may be used. The spectral flatness φ is measured on some representation of the high band spectrum. It may, for example, be derived from the high band LPC coefficients A using the well-known expression:

$\begin{matrix}{{\phi = \frac{^{\frac{1}{N}{\sum\limits_{i = 0}^{N - 1}{\log {(X_{i})}}}}}{\frac{1}{N}{\sum\limits_{i = 0}^{N - 1}X_{i}}}}{where}} & (19) \\{{X_{i} = \frac{1}{{{{DFT}\left( {A,M} \right)}}^{2}}},{i = 0},1,2,\ldots \mspace{14mu},{N - 1}} & (20)\end{matrix}$

where DFT(A,M) denotes the discrete Fourier transform of length M of the LPC coefficients A. The expression |·| denotes the magnitude of the complex transform values (the dot represents a mathematical expression), and due to the symmetry of the transform only the first N=M/2 values are considered. This transform is preferably implemented with an FFT (Fast Fourier Transform) and M would be the nearest power of 2 greater than or equal to the filter length P+1, i.e. M=2^(⌈log₂(P+1)⌉).

If P+1<M, the input filter A is padded with zeroes before the FFT is performed. The spectral flatness φ may also be calculated using the quantized LPC coefficients Â. If this is done, the spectral flatness measure may be calculated in the decoder without additional signaling. In this case the system can be described by FIG. 4, provided that A is substituted with Â in equation (20).
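The flatness measure of equations (19) and (20) may be sketched as follows. For self-containment the sketch uses a direct DFT rather than an FFT, which gives the same values; the function names are hypothetical, and the filter is assumed to have no zero exactly on an evaluated frequency bin, since X_i would then be undefined.

```python
import cmath
import math

def fft_length(p_plus_1):
    """Smallest power of two M with M >= P+1 (requires P+1 >= 2)."""
    return 1 << (p_plus_1 - 1).bit_length()

def lpc_spectral_flatness(a):
    """Equations (19)-(20): ratio of geometric to arithmetic mean of
    X_i = 1/|DFT(A, M)_i|^2 over the first N = M/2 bins."""
    m = fft_length(len(a))
    padded = list(a) + [0.0] * (m - len(a))  # zero padding up to length M
    n = m // 2
    x = []
    for i in range(n):
        dft = sum(padded[k] * cmath.exp(-2j * cmath.pi * i * k / m)
                  for k in range(m))
        x.append(1.0 / abs(dft) ** 2)
    geometric = math.exp(sum(math.log(v) for v in x) / n)
    arithmetic = sum(x) / n
    return geometric / arithmetic
```

A filter with A={1, 0, 0} has a flat spectrum and gives φ=1, while any filter that shapes the spectrum gives a value below 1.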

It may be desirable to determine the spectral flatness measure on the encoder side to reduce the overall complexity when considering both encoder and decoder. In such an embodiment the encoder includes a spectral flatness estimator configured to determine, for transmission to a decoder, a measure of spectral flatness of the high band signal. An encoder using a spectral flatness estimator 46 based on the LPC coefficients is depicted in FIG. 6. In this case, the flatness measure must be signaled in the bit-stream. The signaling may consist of a binary decision φ̂∈{0,1} whether the spectral flatness is considered high or low depending on a threshold value φ_(thr):

$\begin{matrix}\left\{ \begin{matrix}{{\hat{\phi} = 0},} & {\phi \geq \phi_{thr}} \\{{\hat{\phi} = 1},} & {\phi < \phi_{thr}}\end{matrix} \right. & (21)\end{matrix}$

The corresponding control parameter f may, for example, be derived using the binary decision φ̂, i.e. f=1−2φ̂.

With the above definitions, the control parameter f will be 1 for flatness values above the threshold and −1 for flatness values below the threshold. To limit the influence of the abrupt switching between these values, the control parameter may further be smoothened using e.g. a forgetting factor β in a similar way as for the tilt filtering:

$\begin{matrix}{f'_{m} = \beta \cdot f_{m} + (1 - \beta) \cdot f'_{m - 1}} & (22)\end{matrix}$
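The signaling chain of equations (21) and (22) may be sketched as follows. The function names are hypothetical, and the default β=0.1 is an assumption borrowed from the tilt smoothing, since no value is stated for equation (22).

```python
def flatness_flag(phi, phi_thr):
    """Equation (21): binary decision sent by the encoder."""
    return 0 if phi >= phi_thr else 1

def control_from_flag(flag):
    """Decoder side: f = 1 - 2*flag, i.e. +1 for flag 0 and -1 for flag 1."""
    return 1 - 2 * flag

def smooth_control(f, f_prev, beta=0.1):
    """Equation (22): recursive smoothing of the control parameter."""
    return beta * f + (1.0 - beta) * f_prev
```

Starting from a smoothed value of 1, a single frame with f=−1 only moves the smoothed control parameter to 0.8, so the switch between the two states is spread over several frames.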

A decoder 200 corresponding to the encoder in FIG. 6 is shown in FIG. 7. It is similar to the decoder in FIG. 4. However, in FIG. 7 the joint post-filter and excitation controller 44 determines the control parameter f based on the received binary decision φ̂ instead of the linear predictor filter Â representing the envelope. Generally, the control parameter f is adapted to a measure of spectral flatness (φ) of the high band.

It should be noted that other processing stages may be possible before the synthesis filter 1/Â or before or after the post-filter H(z). One such processing stage could be a temporal shaping procedure which aims to reconstruct the temporal structure of the original high band signal. Such temporal shaping may be encoded using a gain-shape vector quantization representing gain correction factors on a subframe level. Part of the temporal shaping will also be inherited from the low band excitation signal which is partly used as a base for the high band excitation signal.

The post-filter and excitation mixing may also affect the energy of the signals. Keeping the energy stable is desirable and there are many available methods for handling this. One possible solution is to measure the energy before and after the modification and restore the energy to the value before excitation mixing and post-filtering. The energy measurement may also be limited to a certain band or to the higher energy regions of the spectrum, allowing energy loss in the valleys of the spectrum. In this example embodiment energy compensation may be used as an integral part of the mixing and post-filter functions.
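One of the many possible energy compensation methods, measuring the energy before and after the modification and rescaling, may be sketched as follows. The function name is hypothetical and the sketch measures energy over the full signal rather than over a selected band.

```python
import math

def restore_energy(reference, processed):
    """Scale `processed` so that its energy equals that of `reference`,
    the signal before excitation mixing and post-filtering."""
    e_ref = sum(s * s for s in reference)
    e_proc = sum(s * s for s in processed)
    if e_proc == 0.0:
        return list(processed)  # nothing to scale
    gain = math.sqrt(e_ref / e_proc)
    return [gain * s for s in processed]
```

Restricting the two energy sums to a chosen band or to the high-energy regions would give the band-limited variant mentioned above.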

Frequency Domain BWE

Frequency transform based audio coders are often used for general audio signals such as music or speech with background noises or reverberation. At low bitrates they generally show poor performance. One common prior art solution is to lower the bandwidth to obtain acceptable quality for a narrower band and apply BWE for the higher frequencies. An overview of such a system is shown in FIG. 8.

The input audio is first partitioned into time segments or frames as a preparation step for the frequency transform. Each frame y is transformed to frequency domain to form a frequency domain spectrum Y. This may be done using any suitable transform, such as the Modified Discrete Cosine Transform (MDCT), the Discrete Cosine Transform (DCT) or the Discrete Fourier Transform (DFT). The frequency spectrum is partitioned into shorter row vectors denoted Y(b). These functions are performed by a frequency transformer 50. Each vector now represents the coefficients of a frequency band b out of a total number of bands N_(b). From a perceptual perspective it is beneficial to partition the spectrum using a non-uniform band structure which follows the frequency resolution of the human auditory system. This generally means that narrow bandwidths are used for low frequencies while larger bandwidths are used for high frequencies.

Next, the norm of each band is calculated in an envelope analyzer 52 to form a sequence of gain values E(b) which form the spectral envelope. These values are then quantized using an envelope encoder 54 to form the quantized envelope Ê(b). The envelope quantization may be done using any quantizing technique, e.g. differential scalar quantization or any vector quantization scheme. The quantized envelope coefficients Ê(b) are used to normalize the band vectors Y(b) in an envelope normalizer 56 to form corresponding normalized shape vectors X(b):

$\begin{matrix}{{X(b)} = {\frac{1}{\hat{E}(b)}{Y(b)}}} & (23)\end{matrix}$

The sequence of normalized shape vectors X (b) constitutes the finestructure of the spectrum. The perceptual importance of the spectralfine structure varies with the frequency but may also depend on othersignal properties such as the spectral envelope signal. Transform codersoften employ an auditory model to determine the important parts of thefine structure and assign the available resources to the most importantparts. The spectral envelope is often used as input to this auditorymodel and the output is typically a bit assignment for the each of thebands corresponding to the envelope coefficients. Here, a bit allocationalgorithm in a bit allocator 58 uses the quantized envelope Ê(b) incombination with an internal auditory model to assign a number of bitsR(b) which in turn are used by a fine structure encoder 60. When thetransform coder is operated at low bitrates, some of the bands will beassigned zero bits and the corresponding shape vectors will not bequantized. The indices I_(E) and I_(X) from the quantization of theenvelope and the encoded fine structure vectors, respectively, aremultiplexed in a bitstream mux (multiplexer) 62 to be stored ortransmitted to a decoder.

The decoder demultiplexes the indices from the communication channel or the stored media in a bitstream demux (de-multiplexer) 70 and forwards the indices I_(X) to a fine structure decoder 72 and I_(E) to an envelope decoder 74. The quantized envelope Ê(b) is obtained and fed to the bit allocation algorithm in a bit allocator 76 in the decoder, which generates the bit allocation R(b). Using R(b), the highest band with a non-zero bit allocation is found. This band is denoted b_(max).
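The search for the last coded band can be sketched as follows. This is an illustrative fragment only: the function name `find_b_max`, the 1-based band numbering in the return value, and the example allocation are assumptions made for the sketch.

```python
def find_b_max(bit_allocation):
    """Return the 1-based index of the last band with R(b) > 0, or 0 if none."""
    b_max = 0
    for b, bits in enumerate(bit_allocation, start=1):
        if bits > 0:
            b_max = b
    return b_max

# Example allocation R(b) for N_b = 8 bands; band 3 is a zero-bit band
# inside the coded region, bands 6-8 form the uncoded high band.
R = [12, 9, 0, 6, 4, 0, 0, 0]
b_max = find_b_max(R)
```

Note that a zero-bit band below the last coded band (band 3 in the example) does not end the low band; only the trailing run of zero-bit bands does.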

The fine structure decoder 72 uses the fine structure indices I_(X) and the bit allocation R(b) to produce the quantized fine structure vectors X̂_(L)(b), which are defined for b=1, 2, . . . , b_(max).

In this example embodiment the crossover frequency is adaptive depending on the bit allocation and starts from the band b_(max)+1, given the constraint that b_(max)+1≤N_(b).

There may be bands b<b_(max) which have zero bits assigned. In particular for low bitrates it is common that such zero-bit bands appear, and due to variations in the spectrum the positions of the zero-bit bands usually vary from frame to frame. Such variations cause modulation effects in the synthesis. Typically the zero-bit bands are handled with spectral filling techniques, where signals are injected in the zero-bit bands. The filling signal may be a pseudo-random noise signal or a modified version of the coded bands. The filling technique is not an essential part of this technology and it is assumed that a suitable spectral filling is part of the fine structure decoder 72. After the spectral filling has been done, the low band fine structure X̂_(L)(b) is input to a low frequency envelope shaper 78, which restores the synthesized low band spectrum Ŷ_(L)(b) in accordance with:

$\begin{matrix}{{{\hat{Y}}_{L}(b)} = {{{\hat{X}}_{L}(b)} \cdot {\hat{E}(b)}},\quad{b = 1},2,\ldots,b_{\max}} & (24)\end{matrix}$

The low band fine structure X̂_(L)(b) is also input to a fine structure modifier or processor 80, which identifies the length of the low band structure from the parameter b_(max) and creates a high band excitation signal X̂_(H)(b) defined for b=b_(max)+1, b_(max)+2, . . . , N_(b). There are many techniques for creating a high band excitation from the low band excitation. In this example embodiment, the upper half of the low band excitation is folded and duplicated to fill the high band excitation. Assume that X̂_(LH) represents the upper half of the low band excitation signal and that the function rev(.) reverses the elements of a vector. Then the sequence [rev(X̂_(LH)) X̂_(LH) rev(X̂_(LH)) X̂_(LH) . . . ] is repeated as many times as needed to fill the high band excitation spectrum X̂_(H)(b), b=b_(max)+1, b_(max)+2, . . . , N_(b). The high band excitation signal is then input to a high frequency envelope shaper 82 to form the synthesized high band spectrum Ŷ_(H)(b) in accordance with:

$\begin{matrix}{{{\hat{Y}}_{H}(b)} = {{{\hat{X}}_{H}(b)} \cdot {\hat{E}(b)}},\quad{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots,N_{b}} & (25)\end{matrix}$
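The folding of the low band excitation described above can be sketched as follows. The fragment works on a flat list of coefficients rather than on band vectors and truncates the last segment when the high band length is not a multiple of the segment length; both choices, and the name `fold_high_band`, are assumptions made for illustration.

```python
def fold_high_band(low_band, high_len):
    """Fill high_len coefficients with rev(X_LH), X_LH, rev(X_LH), X_LH, ..."""
    upper_half = low_band[len(low_band) // 2:]
    if not upper_half:
        return [0.0] * high_len
    high = []
    mirrored = True  # the sequence starts with the reversed segment
    while len(high) < high_len:
        high.extend(upper_half[::-1] if mirrored else upper_half)
        mirrored = not mirrored
    return high[:high_len]

low = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
high = fold_high_band(low, 8)
```

Starting with the reversed segment means that the first high band coefficient is a mirror image of the last low band coefficient, so the excitation is continuous across the crossover.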

The synthesized low band spectrum Ŷ_(L)(b) and the synthesized high band spectrum Ŷ_(H)(b) are combined in a spectrum combiner 84 to form the synthesis spectrum Ŷ(b), or Ŷ with the band index omitted. The synthesis spectrum is input to the inverse frequency transformer 86 to form the output signal ŷ. In this process the necessary windowing and overlap-add operations that are connected with the frequency transform are also conducted.

As was the case for the time domain BWE, the excitation from the low band may have properties that are not suitable for use as high band excitation. In particular, one may wish to flatten out some of the fine structure in the low band excitation. A decoder of such an example system is shown in FIG. 9. This prior art system assumes an encoder as outlined in FIG. 8. In addition to the described scheme there is a compressor H (at 88) which operates on the high band excitation signal X̂_(H)(b) to produce the compressed high band excitation signal X̃_(H)(b). One example compressor function is:

$\begin{matrix}{H = \left( \frac{\max\left( \left| {\hat{X}}_{H} \right| \right)}{\left| {\hat{X}}_{H} \right|} \right)^{\eta}} & (26)\end{matrix}$

which means H is a vector with the same length as X̂_(H). Here the band index b has been omitted and the vector represents all elements for the defined bands, i.e.:

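$\begin{matrix}{{\hat{X}}_{H} = \left\lbrack {{{\hat{X}}_{H}\left( {b_{\max} + 1} \right)}\mspace{14mu}{{\hat{X}}_{H}\left( {b_{\max} + 2} \right)}\mspace{14mu}\ldots\mspace{14mu}{{\hat{X}}_{H}\left( N_{b} \right)}} \right\rbrack} & (27)\end{matrix}$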

The compression factor η is smaller than 1 and a suitable value may be η=0.5 or in the range η∈[0.01,0.99], where values close to 0 give no effect and values close to 1 give maximum compression. The compressed high band excitation is obtained by the element-wise multiplication of H and X̂_(H). It can be expressed as a matrix multiplication:

$\begin{matrix}{{\tilde{X}}_{H} = {H\,{{diag}\left( {\hat{X}}_{H} \right)}}} & (28)\end{matrix}$

where diag(X̂_(H)) produces a square matrix with X̂_(H) on the diagonal. The compressed high band excitation X̃_(H)(b) is input to the high frequency envelope shaper 82 to form the high band spectrum Ŷ_(H)(b) in accordance with:

$\begin{matrix}{{{\hat{Y}}_{H}(b)} = {{{\tilde{X}}_{H}(b)} \cdot {\hat{E}(b)}},\quad{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots,N_{b}} & (29)\end{matrix}$
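A minimal sketch of the compressor of equations (26) and (28) is given below. It assumes that the magnitudes of the coefficients are used in the ratio and adds a small floor to avoid division by zero; the floor, the function name `compress_excitation` and the example vector are assumptions and not part of the described system.

```python
def compress_excitation(x_high, eta=0.5, floor=1e-12):
    """Return H * X_H element-wise, with H = (max|X_H| / |X_H|) ** eta."""
    if not x_high:
        return []
    peak = max(abs(c) for c in x_high)
    out = []
    for c in x_high:
        mag = max(abs(c), floor)  # assumption: floor guards against zeros
        out.append(((peak / mag) ** eta) * c)
    return out

x_hat = [4.0, -1.0, 0.25]
x_tilde = compress_excitation(x_hat, eta=0.5)
```

The largest coefficient is left unchanged while smaller coefficients are raised towards it, and the sign of each coefficient is preserved. With η approaching 1 all magnitudes approach the peak, which corresponds to the maximum compression mentioned above.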

As illustrated in FIG. 9, the low band spectrum Ŷ_(L)(b) and the high band spectrum Ŷ_(H)(b) are combined in the spectrum combiner 84 to form the synthesis spectrum Ŷ which is input to the inverse frequency transformer 86 to form the output signal ŷ.

An example embodiment of a frequency domain BWE based on the proposed technology focuses on an audio encoder and decoder system mainly intended for general audio signals. The new technology resides mainly in the decoder of an encoding and decoding system as outlined in FIG. 8 with an excitation compression system as illustrated in FIG. 9. An example embodiment of such a decoder 200 is illustrated in FIG. 10.

As an addition to the prior art, the high band excitation compressor 88 is controlled jointly with a spectral envelope expander 90, as shown in FIG. 10. As in the time domain, a control parameter f∈[0,1] is used for steering both the compressor 88 and the expander 90. This is performed by a joint expander and compressor controller 92.

The strength of the high band excitation compressor 88 is adapted using the control parameter f in accordance with:

$\begin{matrix}{H = \left( \frac{\max\left( \left| {\hat{X}}_{H} \right| \right)}{\left| {\hat{X}}_{H} \right|} \right)^{\eta + {{\Delta\eta} \cdot f}}} & (30)\end{matrix}$

where Δη gives the maximum compression factor exponent η+Δη when f=1. If η=0.5 then a suitable value for Δη may be Δη=0.3 or in the range Δη∈[0.01,1−η]. Note that η+Δη≤1. The compressed high band excitation is obtained by the element-wise multiplication of H and X̂_(H), i.e.:

$\begin{matrix}{{\tilde{X}}_{H} = {H\,{{diag}\left( {\hat{X}}_{H} \right)}}} & (31)\end{matrix}$

The expander 90 used on the high band envelope has a structure similar to that of the high band excitation compressor:

$\begin{matrix}{{G = \left( \frac{\max\left( {\hat{E}(b)} \right)}{\hat{E}(b)} \right)^{- {({\varphi + {{\Delta\varphi} \cdot f}})}}},{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots\mspace{14mu},N_{b}} & (32)\end{matrix}$

Here the absolute value |·| may be omitted since the envelope coefficients Ê(b)≥0. For f=0 the expander will have minimum effect with the expansion coefficient φ. A suitable value for φ may be φ=0, since this would give an unaffected envelope for f=0. If a small expansion effect is always desirable, suitable values may for instance be chosen from the range φ∈[0,0.5]. The maximum expansion is obtained for f=1, which gives the expansion factor exponent −(φ+Δφ). The value for Δφ may be set to Δφ=1 but the suitable value would depend heavily on the band structure and may be chosen from a wide range, e.g. Δφ∈[0.5,10]. The expanded envelope Ẽ(b) is obtained by element-wise multiplication of the envelope with the expansion function G, i.e.:

$\begin{matrix}{{\tilde{E}}_{H} = {G\,{{diag}\left( {\hat{E}}_{H} \right)}}} & (33)\end{matrix}$

where Ê_(H) represents the elements of the high band envelope, Ê_(H)=[Ê(b_(max)+1) Ê(b_(max)+2) . . . Ê(N_(b))]. The expanded envelope is applied to the compressed high band fine structure to form the high band spectrum Ŷ_(H)(b) in accordance with:

$\begin{matrix}{{{\hat{Y}}_{H}(b)} = {{{\tilde{X}}_{H}(b)} \cdot {\tilde{E}(b)}},\quad{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots,N_{b}} & (34)\end{matrix}$
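The joint control of equations (30) to (33) can be sketched as a single function driven by f. The defaults η=0.5, Δη=0.3, φ=0 and Δφ=1 are the example values mentioned in the text; the floor against division by zero, the guards for all-zero input and the name `joint_compress_expand` are assumptions of this sketch. The excitation is passed as a flat coefficient list and the envelope as one gain per high band.

```python
def joint_compress_expand(x_high, env_high, f, eta=0.5, d_eta=0.3,
                          phi=0.0, d_phi=1.0, floor=1e-12):
    """Return (compressed excitation, expanded envelope) for control value f."""
    comp_exp = eta + d_eta * f        # exponent of equation (30)
    expa_exp = -(phi + d_phi * f)     # exponent of equation (32)
    x_peak = max((abs(c) for c in x_high), default=0.0)
    e_peak = max(env_high, default=0.0)
    if x_peak > 0.0:
        x_tilde = [((x_peak / max(abs(c), floor)) ** comp_exp) * c
                   for c in x_high]
    else:
        x_tilde = list(x_high)        # assumption: all-zero input passes through
    if e_peak > 0.0:
        e_tilde = [((e_peak / max(e, floor)) ** expa_exp) * e
                   for e in env_high]
    else:
        e_tilde = list(env_high)
    return x_tilde, e_tilde

x0, e0 = joint_compress_expand([4.0, -1.0, 0.25], [4.0, 2.0, 1.0], f=0.0)
x1, e1 = joint_compress_expand([4.0, -1.0, 0.25], [4.0, 2.0, 1.0], f=1.0)
```

With f=0 and φ=0 the envelope is returned unchanged, while f=1 deepens the envelope valleys (here 2 becomes 1 and 1 becomes 0.25) and at the same time flattens the excitation more strongly. The high band spectrum of equation (34) is then obtained by scaling each band of the compressed excitation with the corresponding expanded gain.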

The synthesized low band spectrum Ŷ_(L)(b) and the synthesized high band spectrum Ŷ_(H)(b) are combined in the spectrum combiner 84 to form the synthesis spectrum Ŷ which is input to the inverse frequency transformer 86 to form the output signal ŷ.

The joint control parameter f may be derived from parameters already available in the decoder 200, or it may be based on an analysis done in the encoder and transmitted to the decoder. Here, as for the time domain BWE case, we rely on an estimate of the high band spectral tilt. Such an estimate may be derived from the envelope parameters by measuring the quotient q_(m) of the sums of the envelope coefficients in each half of the high band signal, i.e.:

$\begin{matrix}{q_{m} = \frac{\sum\limits_{b = {b_{\max} + 1}}^{b_{half}}{\hat{E}(b)}}{\sum\limits_{b = {b_{half} + 1}}^{N_{b}}{\hat{E}(b)}}} & (35)\end{matrix}$

where

$\begin{matrix}{b_{half} = {\left\lfloor {\left( {N_{b} - b_{\max}} \right)/2} \right\rfloor + b_{\max} + 1}} & (36)\end{matrix}$

The smoothing of the spectral tilt t_(m) for frame m may be done the same way as in the time domain embodiment, e.g. using:

$\begin{matrix}{t_{m} = {{\beta \cdot q_{m}} + {\left( {1 - \beta} \right)t_{m - 1}}}} & (37)\end{matrix}$

The mapping of the spectral tilt to the control parameter f may also be done using the same piece-wise linear function as in the time domain embodiment, i.e.:

$\begin{matrix}{{f\left( t_{m} \right)} = \left\{ \begin{matrix}{0,} & {t_{m} \geq C_{\max}} \\{{1 - {\left( {t_{m} - C_{\min}} \right)/\left( {C_{\max} - C_{\min}} \right)}},} & {C_{\min} \leq t_{m} < C_{\max}} \\{1,} & {t_{m} < C_{\min}}\end{matrix} \right.} & (38)\end{matrix}$

However, since the definition of the spectral tilt is different, the constants C_(max) and C_(min) of the mapping function will be different. These will for instance depend on the band structure.
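The chain from envelope to control parameter in equations (35) to (38) can be sketched as follows. The code follows equation (36) as printed. The smoothing factor β=0.1, the constants C_(min)=1 and C_(max)=3, the floor on the denominator and all function names are placeholders chosen for the example, since the text leaves these values to depend on the band structure.

```python
def tilt_quotient(envelope, b_max, eps=1e-12):
    """Equations (35)-(36); envelope[b - 1] holds E_hat(b) for b = 1..N_b."""
    n_b = len(envelope)
    b_half = (n_b - b_max) // 2 + b_max + 1
    lower = sum(envelope[b_max:b_half])   # bands b_max+1 .. b_half
    upper = sum(envelope[b_half:n_b])     # bands b_half+1 .. N_b
    return lower / max(upper, eps)        # assumption: floor avoids 0-division

def smooth_tilt(q_m, t_prev, beta):
    """Equation (37): first-order recursive smoothing."""
    return beta * q_m + (1.0 - beta) * t_prev

def tilt_to_control(t_m, c_min, c_max):
    """Equation (38): piece-wise linear map from tilt to f in [0, 1]."""
    if t_m >= c_max:
        return 0.0
    if t_m < c_min:
        return 1.0
    return 1.0 - (t_m - c_min) / (c_max - c_min)

E_hat = [9.0, 8.0, 7.0, 6.0, 4.0, 3.0, 2.0, 1.0]   # N_b = 8
q = tilt_quotient(E_hat, b_max=4)                  # (4 + 3 + 2) / 1
t = smooth_tilt(q, t_prev=1.0, beta=0.1)
f = tilt_to_control(t, c_min=1.0, c_max=3.0)
```

A steeply falling high band envelope gives a large quotient and therefore a small f, so that the compressor and expander are applied only weakly in that case.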

In an alternative to the frequency domain embodiment described above, the joint envelope and excitation control is adapted to the low band error signal which is estimated in the encoder, which is similar to the encoder in the system outlined in FIG. 8, but further has a local decoding and error measurement unit. An example of such a system is shown in FIG. 11, wherein the local decoding and error measurement unit includes a local decoder 96, a low frequency spectrum extractor 98, an adder 100 and a low frequency error encoder 102. In this embodiment a local low band synthesis is obtained by using the quantized envelope Ê(b) and a decoded low band fine structure X̂_(L)(b) which is extracted from the fine structure encoder. It may also be possible to run the full fine structure decoder to extract X̂_(L)(b) from the indices I_(X), but a local synthesis can in general be extracted from the encoder with less computational complexity. A locally synthesized low band spectrum Ŷ_(L)(b) is generated by shaping the decoded low band structure with the quantized envelope:

$\begin{matrix}{{{\hat{Y}}_{L}(b)} = {{{\hat{X}}_{L}(b)} \cdot {\hat{E}(b)}},\quad{b = 1},2,\ldots,b_{\max}} & (39)\end{matrix}$

The low band spectrum of the input signal Y_(L)(b) is extracted from the full spectrum by finding the last quantized band using the bit allocation R(b). A low band error signal is formed as the log ratio of the input signal energy and the squared Euclidean distance between the synthesized low band spectrum and the input low band spectrum, i.e. a signal-to-noise ratio (SNR) measure D_(L) on the low band synthesis defined as:

$\begin{matrix}{D_{L} = {10{\log_{10}\left( \frac{Y_{L}Y_{L}^{T}}{\left( {Y_{L} - {\hat{Y}}_{L}} \right)\left( {Y_{L} - {\hat{Y}}_{L}} \right)^{T}} \right)}}} & (40)\end{matrix}$

The low band SNR is quantized and the quantization indices I_(ERR) are multiplexed together with the envelope indices I_(E) and the fine structure indices I_(X) to be stored or transmitted to a decoder. The low band SNR encoding may be done e.g. using a uniform scalar quantizer.
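The SNR measure of equation (40) and a uniform scalar quantizer for it can be sketched as below. The step size of 2 dB, the 32 quantization levels, the floor inside the logarithm and the function names are assumptions made for the example; the text does not specify the quantizer resolution.

```python
import math

def low_band_snr_db(y_low, y_low_synth, floor=1e-12):
    """Equation (40): 10*log10(signal energy / squared error), in dB."""
    signal = sum(a * a for a in y_low)
    noise = sum((a - b) ** 2 for a, b in zip(y_low, y_low_synth))
    # Assumption: floors keep the ratio finite for silent or perfect input.
    return 10.0 * math.log10(max(signal, floor) / max(noise, floor))

def quantize_snr(d_low, step=2.0, levels=32):
    """Uniform scalar quantizer; returns (index, reconstructed value)."""
    index = int(math.floor(d_low / step + 0.5))
    index = min(max(index, 0), levels - 1)
    return index, index * step

Y_L = [3.0, 4.0]
Y_L_hat = [3.0, 3.5]
D_L = low_band_snr_db(Y_L, Y_L_hat)      # energy 25, squared error 0.25
I_ERR, D_L_hat = quantize_snr(D_L)
```

Only the index would be multiplexed into the bitstream; the decoder reconstructs the SNR as the index multiplied by the same step size.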

The decoder 200 is similar to the decoder outlined in FIG. 9, but further controls the high band excitation compressor jointly with a spectral envelope expander, as shown in FIG. 10. As in the time domain embodiments, a control parameter f∈[0,1] is used for steering both the compressor and the expander.

Using the control parameter f the strength of the high band excitation compressor is adapted in accordance with:

$\begin{matrix}{H = \left( \frac{\max\left( \left| {\hat{X}}_{H} \right| \right)}{\left| {\hat{X}}_{H} \right|} \right)^{\eta + {{\Delta\eta} \cdot f}}} & (41)\end{matrix}$

where Δη gives the maximum compression factor exponent η+Δη when f=1. If η=0.5 then a suitable value for Δη may be Δη=0.3 or in the range Δη∈[0.01,1−η]. Note that η+Δη≤1. The compressed high band excitation is obtained by the element-wise multiplication of H and X̂_(H) in accordance with:

$\begin{matrix}{{\tilde{X}}_{H} = {H\,{{diag}\left( {\hat{X}}_{H} \right)}}} & (42)\end{matrix}$

The expander used on the high band envelope has a structure similar to that of the high band excitation compressor:

$\begin{matrix}{{G = \left( \frac{\max \left( {\hat{E}(b)} \right)}{\hat{E}(b)} \right)^{- {({\varphi + {\Delta \; {\varphi \cdot f}}})}}},{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots \mspace{14mu},N_{b}} & (43)\end{matrix}$

Here the absolute value |·| may be omitted since the envelope coefficients Ê(b)≥0. For f=0 the expander will have minimum effect with the expansion coefficient φ. A suitable value for φ may be φ=0, since this would give an unaffected envelope for f=0. If a small expansion effect is always desirable, suitable values may for instance be chosen from the range φ∈[0,0.5]. The maximum expansion is obtained for f=1, which gives the expansion factor exponent −(φ+Δφ). The value for Δφ may be set to Δφ=1 but the suitable value would depend heavily on the band structure and may be chosen from a wide range, e.g. Δφ∈[0.5,10]. The expanded envelope Ẽ(b) is obtained by element-wise multiplication of the envelope with the expansion function G, i.e.:

$\begin{matrix}{{\tilde{E}}_{H} = {G\,{{diag}\left( {\hat{E}}_{H} \right)}}} & (44)\end{matrix}$

where Ê_(H) represents the elements of the high band envelope, Ê_(H)=[Ê(b_(max)+1) Ê(b_(max)+2) . . . Ê(N_(b))]. The expanded envelope is applied to the compressed high band fine structure X̃_(H)(b) to form the high band spectrum Ŷ_(H)(b) in accordance with:

$\begin{matrix}{{{\hat{Y}}_{H}(b)} = {{{\tilde{X}}_{H}(b)} \cdot {\tilde{E}(b)}},\quad{b = {b_{\max} + 1}},{b_{\max} + 2},\ldots,N_{b}} & (45)\end{matrix}$

The synthesized low band spectrum Ŷ_(L)(b) and the synthesized high band spectrum Ŷ_(H)(b) are combined in the spectrum combiner to form the synthesis spectrum Ŷ which is input to the inverse frequency transformer to form the output signal ŷ.

In this embodiment the control parameter f is based on the low band SNR from the encoder analysis. First, a reconstructed low band SNR D̂_(L) is obtained from the low band error index I_(ERR). The reconstructed low band SNR is mapped to a control parameter f using a piece-wise linear function:

$\begin{matrix}{f = \left\{ \begin{matrix}{0,} & {{\hat{D}}_{L} < D_{\min}} \\{{\left( {{\hat{D}}_{L} - D_{\min}} \right)/\left( {D_{\max} - D_{\min}} \right)},} & {D_{\min} \leq {\hat{D}}_{L} \leq D_{\max}} \\{1,} & {{\hat{D}}_{L} > D_{\max}}\end{matrix} \right.} & (46)\end{matrix}$

where the constants D_(min) and D_(max) depend on the typical low band distortion values for this system. A suitable value for D_(min) may be D_(min)=10 or any value in the range D_(min)∈[5,20], while suitable values for D_(max) may be D_(max)=20 or in the range D_(max)∈[10,50]. This relation will give stronger modification for high SNR values, corresponding to low distortion in the low band. It may also be desirable to have the opposite relation, such that strong modification would be used for low SNRs (high distortion values). Such a relation may be obtained by reversing the relation described above, i.e.:

$\begin{matrix}{f = \left\{ \begin{matrix}{1,} & {{\hat{D}}_{L} < D_{\min}} \\{{\left( {D_{\max} - {\hat{D}}_{L}} \right)/\left( {D_{\max} - D_{\min}} \right)},} & {D_{\min} \leq {\hat{D}}_{L} \leq D_{\max}} \\{0,} & {{\hat{D}}_{L} > D_{\max}}\end{matrix} \right.} & (47)\end{matrix}$
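Both mappings, equations (46) and (47), can be sketched with one function, since the reversed relation equals one minus the direct relation everywhere. The defaults D_(min)=10 and D_(max)=20 are the example values from the text; the function name `snr_to_control` and the `reverse` flag are illustrative assumptions.

```python
def snr_to_control(d_hat, d_min=10.0, d_max=20.0, reverse=False):
    """Map a reconstructed low band SNR (dB) to the control parameter f."""
    if d_hat < d_min:
        f = 0.0
    elif d_hat > d_max:
        f = 1.0
    else:
        f = (d_hat - d_min) / (d_max - d_min)   # equation (46)
    # Equation (47) is the mirrored mapping: (D_max - D) / (D_max - D_min).
    return 1.0 - f if reverse else f

f_direct = snr_to_control(15.0)
f_reversed = snr_to_control(12.0, reverse=True)
```

For an SNR of 12 dB the reversed mapping gives (20−12)/(20−10)=0.8, that is, a strong modification when the low band is coded with high distortion.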

It shall be noted that the compressor and expander functions may change the overall energy of the vectors. Preferably the energy should be kept stable and there are many available methods for handling this. One possible solution is to measure the energy before and after the modification and restore the energy to the value before compression or expansion. The energy measurement may also be limited to a certain band or to the higher energy regions of the spectrum, allowing energy loss in the valleys of the spectrum. In this exemplary embodiment it is assumed that some energy compensation is used and that it is an integral part of the compressor and expander functions.
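One possible form of the energy compensation mentioned above, measuring the energy before and after the modification and rescaling, is sketched below. It measures the energy over the whole vector, which is only one of the options named in the text; the function name `restore_energy` and the example vectors are assumptions.

```python
import math

def restore_energy(original, modified):
    """Scale `modified` so that its energy equals the energy of `original`."""
    e_before = sum(c * c for c in original)
    e_after = sum(c * c for c in modified)
    if e_after <= 0.0:
        return list(modified)          # nothing to scale
    gain = math.sqrt(e_before / e_after)
    return [gain * c for c in modified]

x_hat = [4.0, -1.0, 0.25]              # excitation before compression
x_tilde = [4.0, -2.0, 1.0]             # same vector after compression
x_comp = restore_energy(x_hat, x_tilde)
```

A single gain keeps the flattened shape produced by the compressor while bringing the overall level back to that of the unmodified vector.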

The steps, functions, procedures and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.

Alternatively, at least some of the steps, functions, procedures and/or blocks described herein may be implemented in software for execution by suitable processing equipment. This equipment may include, for example, one or several micro processors, one or several Digital Signal Processors (DSP), one or several Application Specific Integrated Circuits (ASIC), video accelerated hardware or one or several suitable programmable logic devices, such as Field Programmable Gate Arrays (FPGA). Combinations of such processing elements are also feasible.

It should also be understood that it may be possible to reuse the general processing capabilities already present in the encoder/decoder. This may, for example, be done by reprogramming of the existing software or by adding new software components.

FIG. 13 illustrates an example embodiment of a control arrangement. This embodiment is based on a processor 210, for example a micro processor, which executes software 220 for jointly controlling the envelope shape and the excitation noisiness with a common control parameter. The software is stored in memory 230. The processor 210 communicates with the memory over a system bus. The input signals are received by an input/output (I/O) controller 240 controlling an I/O bus, to which the processor 210 and the memory 230 are connected. The output signals obtained from the software 220 are outputted from the memory 230 by the I/O controller 240 over the I/O bus. The input and output signals in parentheses correspond to the time domain BWE and the input and output signals without parentheses correspond to the frequency domain BWE.

An embodiment based on a measure φ of spectral flatness may be structurally configured as in FIG. 13 with a processor, memory, system bus, I/O bus and I/O controller.

The technology described above is intended to be used in an audio encoder/decoder, which can be used in a mobile device (e.g. mobile phone, laptop) or a stationary device, such as a personal computer. Here the term User Equipment (UE) will be used as a generic name for such devices. FIG. 14 illustrates a UE including a decoder provided with a control arrangement. A radio signal received by a radio unit 300 is converted to baseband, channel decoded and forwarded to an audio decoder 200. The audio decoder is provided with a control arrangement 310 operating in the time or frequency domain as described above. The decoded and bandwidth extended audio samples are forwarded to a D/A conversion and amplification unit 320, which forwards the final audio signal to a loudspeaker 330.

FIG. 15 is a flow chart illustrating the proposed technology. Step S1 jointly controls the envelope shape and the excitation noisiness with a common control parameter f.

FIG. 16 is a flow chart illustrating an example embodiment of the proposed technology. In this embodiment step S1 includes a step S1A controlling the envelope shape by using a formant post-filter H(z), for example having the form defined by equation (6). The predetermined constants γ₁, γ₂ may, for example, be determined in accordance with one of the equations (7)-(10).

FIG. 17 is a flow chart illustrating an embodiment of the proposed technology. In this embodiment step S1 includes a step S1B controlling the excitation noisiness by mixing a high band excitation x_(H,i) of a subframe i with noise n_(i) in accordance with equation (1), where the mixing factors g_(x)(i) and g_(n)(i) are defined by, for example, equation (11) or (12), depending on the choice of predetermined constants γ₁, γ₂.

FIG. 18 is a flow chart illustrating an embodiment of the proposed technology. In this embodiment step S1 includes a step S1C adapting the control parameter f to a high band spectral tilt t_(m) of frame m, for example in accordance with equation (18). In one embodiment the high band spectral tilt t_(m) may be approximated using the second coefficient a_(1,m) of the decoded linear predictor filter Â_(m)={1, a_(1,m), a_(2,m), . . . , a_(P,m)} of frame m, where P is the filter order. It is generally also beneficial to smooth the high band spectral tilt t_(m), for example in accordance with one of the equations (13), (15)-(17). An embodiment based on a measure φ of spectral flatness may perform step S1C using the approach described with reference to equations (19)-(22).

FIG. 19 is a flow chart illustrating an embodiment of the proposed technology. This embodiment combines the described steps S1A, S1B, S1C. Typically the control parameter f is determined first. It is then used to perform steps S1A and S1B. Other combinations including S1A+S1C or S1B+S1C are also possible.
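How a single value of f can drive both steps S1A and S1B may be sketched as follows, using the post-filter parameter and mixing factor relations that also appear in the claims. The numeric defaults γ₀=0.75, Δγ=0.15 and α=0.5 are placeholders invented for this example and are not values taken from the text; the function names are likewise hypothetical. The coefficient scaling in `weight_lp` is the usual way of evaluating a predictor polynomial at z/γ.

```python
import math

def control_to_gammas(f, gamma0=0.75, d_gamma=0.15):
    """Step S1A: gamma_1 = gamma_0 + f*d_gamma, gamma_2 = gamma_0 - f*d_gamma."""
    return gamma0 + f * d_gamma, gamma0 - f * d_gamma

def weight_lp(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [a_k * gamma ** k for k, a_k in enumerate(a)]

def mixing_gains(v, f, e1, e2, alpha=0.5):
    """Step S1B: gains for excitation and noise, given voicing v in [0, 1]."""
    kept = v * (1.0 - alpha * f)              # share of energy kept from x_H
    g_x = math.sqrt(kept)
    g_n = math.sqrt(e1 * (1.0 - kept) / e2)   # requires noise energy e2 > 0
    return g_x, g_n

f = 0.0                                        # control parameter, found first
g1, g2 = control_to_gammas(f)
numerator = weight_lp([1.0, -0.5, 0.25], g1)   # A_hat(z / gamma_1)
denominator = weight_lp([1.0, -0.5, 0.25], g2) # A_hat(z / gamma_2)
g_x, g_n = mixing_gains(v=0.64, f=f, e1=1.0, e2=1.0)
```

With f=0 the two weighted polynomials coincide, so the post-filter reduces to unity and the mixing depends on the voicing alone; increasing f separates γ₁ and γ₂ and at the same time shifts energy from the excitation to the noise.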

It will be understood by those skilled in the art that various modifications and changes may be made to the proposed technology without departure from the scope thereof, which is defined by the appended claims.

ABBREVIATIONS

ASIC Application Specific Integrated Circuit
BWE Bandwidth Extension
CELP Code Excited Linear Predictor
DCT Discrete Cosine Transform
DFT Discrete Fourier Transform
DSP Digital Signal Processor
FFT Fast Fourier Transform
FPGA Field Programmable Gate Arrays
HF High Frequency
LF Low Frequency
LP Linear Predictor
LPC Linear Predictive Coding
MDCT Modified Discrete Cosine Transform
QMF Quadrature Mirror Filter
SBR Spectral Band Replication
SNR Signal-to-Noise Ratio
TCX Transform coded residual
UE User Equipment

REFERENCES

[1] “AMR-WB+: A new audio coding standard for 3rd generation mobile audio services”, J. Mäkinen, B. Bessette, S. Bruhn, P. Ojala, R. Salami, A. Taleb, ICASSP 2005.
[2] “Enhanced aacPlus encoder Spectral Band Replication (SBR) part”, 3GPP TS 26.404 V10.0.0 (2011-03), sections 5.6.1-5.6.3, pp. 22-25.

1. A method of generating a high band extension of an audio signal from an envelope and an excitation, wherein the method includes the step (S1) of jointly controlling envelope shape and excitation noisiness with a common control parameter (f).
2. The method of claim 1, including the step of controlling (S1A) the envelope shape by using a formant post-filter H(z) of the form: ${H(z)} = \frac{\hat{A}\left( {z/\gamma_{1}} \right)}{\hat{A}\left( {z/\gamma_{2}} \right)}$ where Â is a linear predictor filter representing the envelope, and γ₁, γ₂ are functions of the control parameter f.
3. The method of claim 2, wherein $\left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}}}}\end{matrix} \right.$ where γ₀, Δγ are predetermined constants.
4. The method of any of the preceding claims, including the step of controlling (S1B) the excitation noisiness by mixing a high band excitation x_(H,i) of a subframe i with noise n_(i) in accordance with: ${\tilde{x}}_{i} = {{{g_{x}(i)}x_{H,i}} + {{g_{n}(i)}n_{i}}}$ where the mixing factors g_(x)(i) and g_(n)(i) are defined by: $\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{\nu(i)}\left( {1 - {\alpha f}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{\nu(i)}\left( {1 - {\alpha f}} \right)}} \right)}/E_{2}}}\end{matrix} \right.$ where ν(i) is a voicing parameter partially controlling the excitation noisiness, α is a predetermined tuning constant, E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and E₂ is the frame energy of the noise n_(i) for all subframes i.
5. The method of claim 2, wherein $\left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}_{sharp}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}_{sharp}}}}\end{matrix} \right.,\; f \geq 0 \qquad \left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}_{flat}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}_{flat}}}}\end{matrix} \right.,\; f < 0$ where γ₀, Δγ_(flat) and Δγ_(sharp) are predetermined constants.
6. The method of claim 5, including the step of controlling (S1B) the excitation noisiness by mixing a high band excitation x_(H,i) of a subframe i with noise n_(i) in accordance with: ${\tilde{x}}_{i} = {{{g_{x}(i)}x_{H,i}} + {{g_{n}(i)}n_{i}}}$ where the mixing factors g_(x)(i) and g_(n)(i) are defined by: $\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{\nu(i)}\left( {1 - {\max\left( {0,{\alpha f}} \right)}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{\nu(i)}\left( {1 - {\max\left( {0,{\alpha f}} \right)}} \right)}} \right)}/E_{2}}}\end{matrix} \right.$ where ν(i) is a voicing parameter partially controlling the excitation noisiness, α is a predetermined tuning constant, E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and E₂ is the frame energy of the noise n_(i) for all subframes i.
7. The method of any of the preceding claims 2-6, including the step of adapting (S1C) the control parameter f to a high band spectral tilt t_(m) of frame m.
8. The method of claim 7, wherein the control parameter f depends on the high band spectral tilt t_(m) in accordance with: ${f\left( t_{m} \right)} = \left\{ \begin{matrix}{0,} & {t_{m} \geq C_{\max}} \\{{1 - {\left( {t_{m} - C_{\min}} \right)/\left( {C_{\max} - C_{\min}} \right)}},} & {C_{\min} \leq t_{m} < C_{\max}} \\{1,} & {t_{m} < C_{\min}}\end{matrix} \right.$ where C_(min) and C_(max) are predetermined constants.
9. The method of claim 7 or 8, wherein the high band spectral tilt t_(m) is approximated using the second coefficient a_(1,m) of the decoded linear predictor filter Â_(m)={1, a_(1,m), a_(2,m), . . . , a_(P,m)} of frame m, where P is the filter order.
10. The method of claim 9, wherein $t_{m} = {{\beta \cdot {\max\left( {0,a_{1,m}} \right)}} + {\left( {1 - \beta} \right)t_{m - 1}}}$ where t_(m) is the spectral tilt value of frame m, t_(m-1) is the spectral tilt value of the previous frame m−1, and β is a constant in the range β∈[0,0.5].
11. The method of any of the preceding claims 2-6, including the step of adapting the control parameter f to a measure of spectral flatness (φ) of the high band.
12. An audio decoder (200) configured to generate a high band extension of an audio signal from an envelope and an excitation, including a control arrangement (41, 42, 44; 88, 90, 92; 310) configured to jointly control envelope shape and excitation noisiness with a common control parameter (f).
13. The decoder of claim 12, wherein the control arrangement (41, 42, 44) includes a joint post-filter and excitation controller (44) configured to control the envelope shape by using a formant post-filter (42) H(z) of the form: ${H(z)} = \frac{\hat{A}\left( {z/\gamma_{1}} \right)}{\hat{A}\left( {z/\gamma_{2}} \right)}$ where Â is a linear predictor filter representing the envelope, and γ₁, γ₂ are functions of the control parameter f.
14. The decoder of claim 13, wherein $\left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}}}}\end{matrix} \right.$ where γ₀, Δγ are predetermined constants.
15. The decoder of any of the preceding claims 12-14, including a mix controller (41) configured to control the excitation noisiness by mixing a high band excitation x_(H,i) of a subframe i with noise n_(i) in accordance with: ${\tilde{x}}_{i} = {{{g_{x}(i)}x_{H,i}} + {{g_{n}(i)}n_{i}}}$ where the mixing factors g_(x)(i) and g_(n)(i) are defined by: $\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{\nu(i)}\left( {1 - {\alpha f}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{\nu(i)}\left( {1 - {\alpha f}} \right)}} \right)}/E_{2}}}\end{matrix} \right.$ where ν(i) is a voicing parameter partially controlling the excitation noisiness, α is a predetermined tuning constant, E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and E₂ is the frame energy of the noise n_(i) for all subframes i.
16. The decoder of claim 13, wherein $\left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}_{sharp}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}_{sharp}}}}\end{matrix} \right.,\; f \geq 0 \qquad \left\{ \begin{matrix}{\gamma_{1} = {\gamma_{0} + {f \cdot {\Delta\gamma}_{flat}}}} \\{\gamma_{2} = {\gamma_{0} - {f \cdot {\Delta\gamma}_{flat}}}}\end{matrix} \right.,\; f < 0$ where γ₀, Δγ_(flat) and Δγ_(sharp) are predetermined constants.
17. The decoder of claim 16, including a mix controller (41) configured to control the excitation noisiness by mixing a high band excitation x_(H,i) of a subframe i with noise n_(i) in accordance with: ${\tilde{x}}_{i} = {{{g_{x}(i)}x_{H,i}} + {{g_{n}(i)}n_{i}}}$ where the mixing factors g_(x)(i) and g_(n)(i) are defined by: $\left\{ \begin{matrix}{{g_{x}(i)} = \sqrt{{\nu(i)}\left( {1 - {\max\left( {0,{\alpha f}} \right)}} \right)}} \\{{g_{n}(i)} = \sqrt{{E_{1}\left( {1 - {{\nu(i)}\left( {1 - {\max\left( {0,{\alpha f}} \right)}} \right)}} \right)}/E_{2}}}\end{matrix} \right.$ where ν(i) is a voicing parameter partially controlling the excitation noisiness, α is a predetermined tuning constant, E₁ is the frame energy of the high band excitations x_(H,i) for all subframes i, and E₂ is the frame energy of the noise n_(i) for all subframes i.
18. The decoder of any of the preceding claims 13-17, wherein the joint post-filter and excitation controller (44) is configured to adapt the control parameter f to a high band spectral tilt t_(m) of frame m.
19. The decoder of claim 18, wherein the control parameter f depends on the high band spectral tilt t_(m) in accordance with: ${f\left( t_{m} \right)} = \left\{ \begin{matrix}{0,} & {t_{m} \geq C_{\max}} \\{{1 - {\left( {t_{m} - C_{\min}} \right)/\left( {C_{\max} - C_{\min}} \right)}},} & {C_{\min} \leq t_{m} < C_{\max}} \\{1,} & {t_{m} < C_{\min}}\end{matrix} \right.$ where C_(min) and C_(max) are predetermined constants.
20. The decoder of claim 18 or 19, wherein the joint post-filter and excitation controller (44) is configured to approximate the high band spectral tilt t_(m) by using the second coefficient a_(1,m) of the decoded linear predictor filter Â_(m)={1, a_(1,m), a_(2,m), . . . , a_(P,m)} of frame m, where P is the filter order.
21. The decoder of claim 20, wherein $t_{m} = {{\beta \cdot {\max\left( {0,a_{1,m}} \right)}} + {\left( {1 - \beta} \right)t_{m - 1}}}$ where t_(m) is the spectral tilt value of frame m, t_(m-1) is the spectral tilt value of the previous frame m−1, and β is a constant in the range β∈[0,0.5].
22. The decoder of any of the preceding claims 13-17, wherein the joint post-filter and excitation controller (44) is configured to adapt the control parameter f to a measure of spectral flatness (φ) of the high band.
23. A user equipment (UE) including an audio decoder in accordance with any of the preceding claims 12-22.
24. An audio encoder including a spectral flatness estimator (46) configured to determine, for transmission to a decoder (200), a measure of spectral flatness (φ) of a high band signal.